ABSTRACT



A data storage includes multiple disk units accessible to
multiple processors/servers. The multiple disk units include
a master disk unit and one or more data-mirroring disk
units. A data-mirroring disk unit is assigned to a corresponding
one of the multiple servers by one of the processors
designated as the mount manager. Data written by the
processors to the data storage is written to the master disk
unit, and copied by the data storage to the data-mirroring
disk units. Data is read by each of the processors from the
data-mirroring disk unit assigned to such processor.

17 Claims, 10 Drawing Sheets 
MULTIPLE PROCESSOR DATA
PROCESSING SYSTEM WITH MIRRORED 
DATA FOR DISTRIBUTED ACCESS 

BACKGROUND OF THE INVENTION 

The present invention relates generally to data processing 
systems, and more particularly to a method that distributes 
multiple copies of data across multiple disk drives of a 
storage system for improved and parallel access to that data 
by multiple processors. 

There are many factors that can operate against optimum 
performance of a data processing system. One such factor 
stems from the relative disparity between the time it takes to 
perform a data access (e.g., read or write) of a peripheral 
storage of a data processing system and the operating speed 
of a data processor making that access. This disparity is 
made more evident with today's penchant for clustered 
systems in which most, if not all, of the multiple processors 
of the system compete for access to the available data 
storage systems. Unfortunately, the storage systems in these 
and other multiple processor environments tend to form a 
bottleneck when being accessed by several of the processors 
of the system at the same time. The problem is worse with 
poor storage system design that makes it difficult for the 
storage system to handle multiple, simultaneous input/output
(I/O) requests, severely impacting system performance.
In addition, poor storage system design can create an
environment that gives rise to possible irreparable loss of 
data. 

Among prior solutions are those using data redundancy
both to back up the data, protecting against loss, and
to allow parallel access for improving system performance.
Such solutions include redundant arrays of independent (or
inexpensive) disks (RAID). There are various RAID
configurations or levels, some using data striping (spreading out
blocks of each file across multiple disks) and error correction
techniques for data protection, but redundancy is not
used. Thus, although these RAID configurations will tend to
improve performance, they do not deliver fault tolerance.
However, data redundancy is used by a RAID level (RAID1)
that employs disk mirroring, thereby providing redundancy
of data and fault tolerance. RAID1 is a well known technology
to increase the I/O performance. Typically the disk
mirroring employed by RAID1 incorporates a group of
several disk drives, but provides a single disk drive image to
servers.

Storage systems employing a RAID1 architecture will
usually limit read/write outside accesses to a master disk
drive. When an I/O write request is received by a RAID1
storage system, the data of the request is written to the 
master disk. A disk controller of the storage system will then 
handle replication of that data by writing it to all of the 
mirrored disks. The end result is that each and every disk of 
the storage system will have the same data. 

When an I/O read request is received, a disk selector
module, typically found in the disk controller, will select one 
of the mirrored disks to read in order to balance the loads 
across the disk drives of the system. A disk controller is 
capable of reading data from multiple disk units in parallel. 
This is why the disk mirroring increases the performance of 
data read operations. 

But this technology has at least two problems. First,
processor elements of the system can be subjected to high
loads, which restrict the number of I/O requests that the
disk controller can process in a period of time. Second, when
an I/O write request is received by the storage device, the
requesting system element (e.g., a processor) must wait for
a response until the disk controller writes the data to all the
disk drives. This can introduce latency in data write
operations.

SUMMARY OF THE INVENTION 
Broadly, the present invention relates to a method of 
allocating each of a number of processor units to a corre- 
sponding one of a number of disk storage units. In this way, 
each processor unit can read data from its allocated disk 
storage unit with minimum conflict to other read and/or 
write operations conducted at or about the same time by 
other processor units. Multiple, simultaneous accesses for 
data will not create or encounter a bottleneck. In addition, 
the redundancy produced by this approach provides a stor- 
age system with fault tolerance. 
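
The allocation idea described above can be sketched in a few lines of Python. This is a minimal illustration only: the class name `Allocator`, the method `allocate`, the unit names, and the policy of picking the unit that currently serves the fewest processors are all assumptions made for the example and do not come from the specification.

```python
from collections import Counter


class Allocator:
    """Hypothetical allocator: maps processor units to disk storage units."""

    def __init__(self, disk_units):
        self.disk_units = list(disk_units)
        self.assignment = {}  # processor id -> disk unit name

    def allocate(self, processor):
        # A repeated request returns the existing allocation.
        if processor in self.assignment:
            return self.assignment[processor]
        # Assumed policy: prefer the unit serving the fewest processors, so
        # the mapping stays one-to-one while enough units are available.
        load = Counter(self.assignment.values())
        unit = min(self.disk_units, key=lambda d: load[d])
        self.assignment[processor] = unit
        return unit


alloc = Allocator(["disk-2", "disk-3"])
first = alloc.allocate("server-2")
second = alloc.allocate("server-3")
```

With two units and two processors, each processor ends up reading from its own unit, which is the conflict-free case the paragraph above describes.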

The invention, then, is directed to a processing system
that includes a number of processor elements connected to
disk storage having a plurality of disk storage units for
maintaining data. One of the processor elements, designated
a "Mount Manager," is responsible for assigning a disk
storage unit to a corresponding one of the other processor
elements so that, preferably, there is a one-to-one
correspondence between a disk storage unit and a processor element.
One of the disk storage units is designated a master disk unit,
and the remaining disk storage units are designated
"mirrored" disk units. A disk controller of the storage system
controls the writing to and reading from the disk storage
units. The disk controller receives I/O write requests from
the processor elements to write the data of that request only
to the master disk unit. A sync daemon running on the disk
controller copies the written data to the mirrored disk units.
Each of the processor elements issues I/O read requests to, and
reads data from, the mirrored disk unit assigned to it by the
Mount Manager. If, however, the I/O read request is issued
before the allocated mirrored disk unit has been updated
with data recently written to the master disk unit, the
requested data will be read from the master disk unit. To
detect such a situation, the disk controller and the sync
daemon use a bitmap status table that indicates which disk
block in each mirrored disk drive has stale data or updated
data.
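
The interplay of master writes, the sync daemon, and the bitmap status table can be illustrated with a toy in-memory model. The sketch below is an assumption-laden illustration, not the patented implementation: `MirroredStore`, its method names, and the use of Python lists in place of disk blocks are all hypothetical.

```python
class MirroredStore:
    """Toy model: one master disk, several mirrors, a per-mirror stale bitmap."""

    def __init__(self, mirror_names, num_blocks):
        self.master = [None] * num_blocks
        self.mirrors = {name: [None] * num_blocks for name in mirror_names}
        # stale[name][block] is True while that mirror lags the master.
        self.stale = {name: [False] * num_blocks for name in mirror_names}

    def write(self, block, data):
        # Writes go only to the master; every mirror copy becomes stale.
        self.master[block] = data
        for flags in self.stale.values():
            flags[block] = True

    def sync(self):
        # Stand-in for the sync daemon: copy stale blocks to the mirrors.
        for name, flags in self.stale.items():
            for block, is_stale in enumerate(flags):
                if is_stale:
                    self.mirrors[name][block] = self.master[block]
                    flags[block] = False

    def read(self, mirror, block):
        # Serve from the assigned mirror unless its copy is stale.
        if self.stale[mirror][block]:
            return self.master[block], "master"
        return self.mirrors[mirror][block], mirror


store = MirroredStore(["mirror-1", "mirror-2"], num_blocks=4)
store.write(2, b"new")
before_sync = store.read("mirror-1", 2)   # served by the master
store.sync()
after_sync = store.read("mirror-1", 2)    # served by the assigned mirror
```

The second element of each returned pair names the disk that served the read, which makes the fallback to the master visible.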

In an alternate embodiment of the invention the mirrored
disks are not updated immediately. Rather, data written to
the mirrored disks is fixed as of the point in time it is
updated. Changes to that data on the master disk unit are not
written to update the mirrored disks until a
processor element issues a "SNAPSHOT" request to the
storage system. At that time the sync daemon of the disk
controller will determine which data needs to be written to
the mirrored disk units for updating, and identify them.
Then, the sync daemon will update those mirrored disk
storage units needing updating. In addition, when data is
proposed to be written to the master disk unit, the disk
controller first checks to see if the data that will be
overwritten has been copied to the mirrored disk units. If not, the
data that will be overwritten is first copied to the mirrored
disk units before being changed.
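
One way to read this snapshot behavior is sketched below. It rests on an interpretation that the text does not spell out: a snapshot request marks the blocks changed since the previous snapshot as pending, the sync daemon copies pending blocks over time, and a write to a still-pending block first pushes the old content to the mirrors. The names `SnapshotStore`, `changed`, `pending`, and `sync_step` are hypothetical.

```python
class SnapshotStore:
    """Toy model of the snapshot embodiment (an interpretation, not the patent's code)."""

    def __init__(self, mirror_names, num_blocks):
        self.master = [None] * num_blocks
        self.mirrors = {name: [None] * num_blocks for name in mirror_names}
        self.changed = set()  # blocks written on the master since the last snapshot
        self.pending = set()  # snapshot-time blocks still to be copied to the mirrors

    def _copy_to_mirrors(self, block):
        for blocks in self.mirrors.values():
            blocks[block] = self.master[block]

    def write(self, block, data):
        # Before overwriting, make sure the snapshot-time content of this
        # block has reached the mirrors.
        if block in self.pending:
            self._copy_to_mirrors(block)
            self.pending.discard(block)
        self.master[block] = data
        self.changed.add(block)

    def snapshot(self):
        # Identify the blocks the mirrors are missing; copying happens later.
        self.pending |= self.changed
        self.changed.clear()

    def sync_step(self):
        # Stand-in for the sync daemon copying one pending block.
        if self.pending:
            block = min(self.pending)
            self._copy_to_mirrors(block)
            self.pending.discard(block)


store = SnapshotStore(["mirror-1"], num_blocks=2)
store.write(0, "a")
store.write(1, "b")
store.snapshot()      # the mirror now owes blocks 0 and 1
store.write(0, "a2")  # forces the old "a" out to the mirror first
store.sync_step()     # the daemon copies the remaining pending block
```

After these calls the mirror holds the image as of the snapshot request, while the master already carries the later write to block 0.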

A number of advantages are achieved by the present
invention. First is that by providing redundant data by
mirroring the content of the master disk unit and assigning
specific ones of the mirrored disk units to corresponding
ones of the processor elements, parallel read accesses may
be made, thereby improving system operation.

These and other advantages of the present invention will
become apparent to those skilled in this art upon a reading
of the following description of the specific embodiments of
the invention, which should be taken in conjunction with the
accompanying drawings.

US 6,542,962 B2

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram broadly illustrating a data
processing system incorporating the present invention;

FIGS. 2-7 illustrate data structures maintained by the
various elements of the system illustrated in FIG. 1 to
primarily track fresh and stale data on the mirrored disk
units;

FIG. 8 is a flow diagram that illustrates the steps taken to
assign one of the mirrored disk units to a server processor
for read operations;

FIG. 9 is a flow diagram illustrating operation of the
Mount Manager;

FIG. 10 is a flow diagram illustrating the steps taken to
failover a disk unit that has been found by a server processor
to have failed;

FIG. 11 is a flow diagram that illustrates the steps taken
to shut down a server processor;

FIG. 12 is a flow diagram illustrating the steps taken by
the storage system of FIG. 1 when an I/O request is received;

FIG. 13 is a flow diagram that illustrates the steps taken
by the storage system to perform a write operation;

FIG. 14 is a flow diagram broadly illustrating the steps
taken by the sync daemon to maintain copies of data written
to the master disk storage unit of FIG. 1 to the mirror disk
storage units;

FIG. 15 is the Mirror Group Status Table for Split mode
of operation of an embodiment of the present invention; and

FIGS. 16A, 16B, and 16C illustrate the changes made to
the Data Status Bitmap Table to reflect changes of data on
the master disk storage unit.

DESCRIPTION OF THE SPECIFIC
EMBODIMENTS

Turning now to the Figures, and for the moment specifically
FIG. 1, there is illustrated a data processing system,
generally identified with the reference numeral 10, that
comprises a number of server processors 12, including one
(server processor 12₁) that serves as a "Mount Manager."
The server processors 12₂, ..., 12₃ are communicatively
interconnected to the Mount Manager 12₁ by a network
structure 11 which may be, for example, a local area network
architecture such as Ethernet using a TCP/IP protocol, or a
fiber channel architecture.

In addition, the Mount Manager 12₁ and server processors
12₂, ..., 12₃ are connected to a storage system 14 by
communicative interconnections 16, which may be part of
the same network architecture as the network structure 11, a
separate network, or individual connections such as a fiber
channel architecture using a small computer system interface
(SCSI) protocol. The storage system 14 is shown as
including a "Mirroring Group" G01, comprising disk storage
units 20, including a master storage disk unit 20₁ and
mirrored disk storage units 20₂, ..., 20₃. It will be evident
to those skilled in this art that the number of disk storage
units 20 can be anything appropriate within the design and
operating capabilities of the storage system 14.

Disk storage units 20 are preferably grouped in "Mirroring
Groups." The disk storage units 20 are shown as having
membership in the Mirroring Group G01. And, while only
one Mirroring Group is illustrated in FIG. 1, to preclude
confusion from unnecessary complexity, it will be apparent,
and in some instances preferable, to have more than one
Mirroring Group. If more than one Mirroring Group is used,
those implementing mirroring according to the present
invention will have one disk storage unit designated as the
master disk storage unit, comparable to the master disk
storage unit 20₁ of the Mirroring Group G01, and one or
more mirrored disk storage units comparable to the disk
storage units 20. The following discussion will refer to more
than one Mirroring Group to show how the disk storage units
of two or more Mirroring Groups are managed by the
storage system 14.

The disk storage units 20 are controlled by a disk
controller 22 that communicatively connects to the disk storage
units 20 by an I/O bus 24. Although not specifically shown,
it will be appreciated by those skilled in this art that the disk
controller 22 will include the necessary processor elements
(e.g., microprocessors) and associated devices (e.g.,
[illegible]) to [illegible] requests submitted by the server
processors 12. As will be seen, the disk controller, with the
help of the Mount Manager 12₁, manages the data that is to
be written to and read from the disk storage units 20. All I/O
write requests are honored by first writing that data to the
master storage disk 20₁, and then copying that same data to
the mirrored disk storage units 20₂, ..., 20₃, thereby providing
multiple copies of data for ready and parallel access to the
server processors 12.

The Mount Manager 12₁ is responsible for establishing
the Mirroring Group, or Mirroring Groups as the case may
be, in response to supervisory (i.e., human) input. That input
may be provided in conventional fashion (i.e., through a
keyboard or other input device, or a combination of input
devices and an application program, to construct appropriate
data structures). In addition, the Mount Manager 12₁ also
allocates a disk storage unit 20 to each of the server
processors 12. For example, it may allocate one mirrored disk
storage unit to server processor 12₂ and another mirrored disk
storage unit to server processor 12₃, or vice versa. However, as
indicated above, although the storage system 14 stores data
on the disk storage units 20 in replicated form, data is written
first only to the master disk storage unit 20₁. That data is
subsequently copied to the mirrored disk storage units 20 of
that Mirroring Group, e.g., mirrored disk storage units 20₂
and 20₃ (for Mirroring Group G01), only after it is written to
the master disk storage unit 20₁.

Each server processor 12 will be provided the address of
the Mount Manager 12₁ by conventional methods, such as
by pre-configured information in a local file system or by
access to a network information service (NIS), a centralized
database on an intranet (e.g., the network structure 11).

Initially, such as when a server processor 12 first boots
and is initialized, it will send a "Mount Point Request" to the
Mount Manager 12₁, in effect, applying for assignment of a
disk storage unit 20 for I/O read requests. In response, the
Mount Manager 12₁ will allocate one of the disk storage units
20 to the requesting server processor 12. In this manner the I/O
read request load imposed upon the storage system 14 by the
server processors 12 is distributed across the disk storage
units 20. Also, each of the server processors 12 will have
resident a file system process 13 and a mount daemon
("mountd") 15. The file system process 13 is used by each
server processor 12 to "mount" (i.e., initialize, etc.) the disk
storage unit 20 that has been allocated that server processor.
The mount daemon, mountd, 15 is used by a server processor
12 to query the Mount Manager 12₁ for the identification of
the disk storage unit to mount through the Mount Point
Request. Also, if a mirrored disk storage unit 20 fails, the
server processor 12 to which that now-failed disk storage
unit has been allocated will use the mountd to request
allocation of a replacement disk storage unit 20. The file
system process 13 also operates to process file level I/O
requests issued by application programs running on the
server processor, as is conventional, and in conventional
fashion. The file system process 13 translates file level I/O
requests from application programs for retrieving the
requested data from the allocated mirrored disk storage unit
20.

Data normally is read only from that mirrored disk
storage unit 20 assigned or allocated to the server processor
12 issuing the I/O read request. However, if the requested
data has changed on the master disk storage unit 20₁ before
the mirrored disk storage unit to be read has been updated to
reflect that change, it will be the master disk storage unit 20₁
that is accessed for that data. In order to have available such
information (1) as the identity of the master disk storage unit
in order to be able to distinguish it from the mirrored units,
or (2) to be able to determine which disk storage units have
membership in which Mirroring Group (if there are more
than one), or (3) to be able to identify which mirrored disk
storage unit is assigned to which server processor 12, or (4)
to track the freshness of data on the mirrored disks 20, a
number of data structures are created and maintained by the
server processors 12 and the storage system 14.

Accordingly, turning now to FIGS. 2-4, there are shown
three data structures: a Mirroring Group Table 30 (FIG. 2),
a Mount Points Table 32 (FIG. 3), and a Disk Unit Status
Table 34 (FIG. 4) that are created and maintained by the
Mount Manager 12₁. The Mirroring Group Configuration
Table 30, shown in FIG. 2, identifies each mirroring group
of the storage system 14 as established by the Mount
Manager 12₁, including the makeup of that mirroring group,
i.e., the number of disk storage units, their addresses, and
which is designated as the master and which are the mirrored
units. Thus, as FIG. 2 illustrates, the column labeled
"Group ID" identifies each Mirroring Group established
for and managed by the storage system 14 (FIG. 1). Here,
there is shown the identification of the Mirroring Group
G01, shown in FIG. 1, and if a second Mirroring Group is
established for the storage system 14 (as assumed here for
illustrative purposes), its identification, G02. To the right are
additional columns identifying the disk
storage units of the Mirroring Group or Groups and their
designations. Thus, the column "Master Disk" identifies
the master disk storage of Mirroring Group G01 as
"Disk 20₁"; the columns "Mirrored Disk 1,"
"Mirrored Disk 2," and "Mirrored Disk 3," etc.,
identify the mirrored disk storage units of the Mirroring
Group G01 as disk storage units 20₂ and 20₃, indicating also
that there is no "Mirrored Disk 3" for that Mirroring Group.
In addition the Mirroring Group Table 30 shows the makeup
of a Mirroring Group G02 (shown here for illustrative
purposes only; not shown in FIG. 1) as including a master
disk storage unit identified as DISK 23, and three mirrored
disks identified as DISK 24, DISK 25, and DISK 26.

The Mount Points Table 32 (FIG. 3) provides the
information as to which disk storage unit 20 has been assigned to
which server processor 12 for the particular Mirroring
Group. If there are more than two Mirroring Groups, there
would be a separate Mount Points Table for each such group.
FIG. 3 illustrates the Mount Points Table for the Mirroring
Group G01, showing that the server processor 12₂ (server
column) has been allocated use of the mirrored disk
storage unit 20₃ (Mount Point column), and that server
processor 12₃ has been allocated the services of the mirrored
disk storage unit 20₂.

The Disk Status Table 34, shown in FIG. 4, provides
information of the availability of each disk storage unit 20
of a Mirroring Group. The "Disk Name" column
identifies the disk storage unit, and the "Available?" column
identifies its status, i.e., availability. Thus, FIG. 4 illustrates
the situation in which one of the disk storage units 20, unit
20₃, has failed, or has been removed from the storage system
14, and is therefore identified as being unavailable by the
"No" in the "Available?" column. The mountd of each server
processor 12 will, when detected, report failure of the
allocated disk storage unit 20 to the Mount Manager 12₁. If an
administrator [illegible] storage unit [illegible] storage
system 14, the disk [illegible] be updated by the administrator
manually to [illegible]. As FIG. 4 further illustrates, the Disk
Unit Status Table 34 shows the disk storage units 20₁ and 20₂
as up and running, i.e., available.

Turning now to FIG. 5, there is shown a Mount Point ID
Table 36. Each server processor 12 maintains a Mount Point
ID Table 36 for identifying which disk storage unit 20 has
been allocated that server processor 12. For example, the
Mount Point ID Table 36 is what would be maintained by the
server processor 12₂, showing (in agreement with the Mount
Points Table 32, maintained by the Mount Manager 12₁) that
the disk storage unit 20₃ has been allocated. The server
processor 12₃ would have a similar Mount Point ID Table,
showing that it had been assigned disk storage unit 20₂.

FIG. 6 is a Data Status Bitmap Table for mirrored data that
is created and maintained by the storage system 14. The Figure
assumes there are two Mirroring Groups (Mirroring Group
G01 of FIG. 1 and the hypothetical Mirroring Group G02)
for purposes of illustration, rather than just the one shown in
FIG. 1. Beginning at the far left of FIG. 6, the first (left most)
column 40 of the bitmap identifies the Mirroring Groups
within the storage system 14. Here, there are only two
mirroring groups identified: Mirroring Groups G01 and
G02. Moving to the right, the next column 42 identifies, for
each mirroring group, the disk storage units within the
corresponding mirroring group. The next column 44,
immediately to the right, serves to label the rows that extend to
the right, for example rows 46 and 48, corresponding to
"Disk01" in column 42, and rows 50, 52, corresponding to
"Disk02" in column 42.

The Data Status Bitmap Table 38 of FIG. 6 is a data
structure that provides information as to whether or not data
written to or otherwise modifying that held by the master
disk storage unit 20₁ has been copied to the mirroring disk
storage units 20₂ and 20₃. For the master disk storage unit
20₁, which has an address of "Disk01," the row 46 identifies
each data storage block of the disk, and the row 48 identifies,
for each block, whether all corresponding mirroring blocks
have been updated; that is, if data in Disk Block 3 has been
rewritten or otherwise modified, that block will need to be
copied to the corresponding Disk Block of the mirroring
disk storage units 20₂ and 20₃. Accordingly, if the data held
by Disk Block 1 of the master disk storage unit 20₁ has at
some time been changed, the "Y" in the "Updated" row for
Disk Block 1 indicates that the change has been copied to the
mirroring disk storage units 20₂ and 20₃. Conversely, the
"N" for Disk Blocks 2 and 3 and 5-9 indicates that data in
those disk blocks of the master disk storage unit has
changed, and that change has not yet been completely
reflected at the mirroring disk storage units 20₂ and 20₃.

Rows 50, 52 show the status of data stored on the disk
storage device of Mirroring Group G01 with the address of
"Disk02," i.e., disk storage unit 20₂. Thus, as FIG. 6
illustrates by rows 50, 52, the disk storage unit 20₂ has
"stale" data in Data Blocks 3 and 4. All the other data blocks
have data that has been synchronized with that held in the
corresponding disk blocks of the master disk storage unit
20₁. The remainder of the Data Status Bitmap Table contains
similar information for the disk storage unit 20₃, as it will for
the disk storage units of the hypothetical Mirroring Group
G02 (in which the disk storage unit having an address of
DISK23 is designated the master). As will be seen, the
information provided by the Data Status Bitmap Table is
used when an I/O read request is received by the storage
system 14 to determine if the requested data is fresh, or
should be read from the master disk storage unit 20₁, which
will always have the most up-to-date data.

FIG. 7 shows a Mirroring Group Status Table 56 that is
also maintained by the storage system 14. A Mirroring Group
can have one of two statuses: "Mirrored" or "Split." The
Mirrored and Split status pertains to whether or not data has
been "fixed," a term that is pertinent to an embodiment of the
invention described below. Basically, if the data has been
fixed at a particular time T, then the server processors 12 are
unable to read that data if it has been updated subsequently.
They can, however, read data updated before the time T.
When there has been an update of data after the time T, the
status of the associated mirroring group is referred to as
"Split." Conversely, a non-Split mirroring group is mirrored,
i.e., since data carried by the master disk storage unit 20₁ has
been copied to each of the other disk storage units 20₂, 20₃
of the mirroring group, any server processor 12 can access
the same data stored on the master disk through any mirrored
disk storage unit.

Turning now to FIG. 8, illustrated in flow diagram form
are the major steps taken by a server processor 12 during its
boot period when coming on-line. As FIG. 8 shows, among
the first steps taken is step 60 in which the server processor
sends a Mount Point message to the Mount Manager and, in
step 62, waits for a response. The Mount Manager will pick
one of the mirrored data storage units 20, and return the
address of that data storage device, in step 64, to the
requesting server processor 12.

FIG. 9 illustrates the operational steps taken by the Mount
Manager 12₁ insofar as the present invention is concerned.
As FIG. 9 shows, the Mount Manager 12₁ will wait, at step
70, until it receives a request from one of the other server
processors 12. When a request is received, it is checked, in
step 72, to see what the type is, i.e., is it (1) a Mount Point
request sent by a server processor to have one of the data
storage units allocated to it for I/O read operations; (2) a
failover request; or (3) an "Unmount" request. Failover requests
may be sent to inform the Mount Manager that the allocated
disk storage unit 20 has failed, requesting to have another
allocated. An Unmount request is part of a shutdown process
performed by a server processor when it is going or is being
taken off-line.

If the request is a Mount Point request, step 72 is exited
in favor of step 74, where the Mount Manager 12₁ first
determines which disk storage units 20 are available, and
then chooses one as the "Mount Point" for allocation to the
requesting server processor 12. Then, in step 76, the Mount
Manager 12₁ will update the Mount Points Table (FIG. 3) to
have it reflect that allocation, and in step 78 send the
identification of the allocated disk storage unit to the
requesting server processor 12. The process then returns to
step 70 to await another request.

If, on the other hand, the Mount Manager 12₁ receives a
Failover Request from one of the server processors 12,
indicating that the disk storage unit 20 allocated the
requesting server processor has failed or is otherwise no longer
available, step 72 is exited in favor
of step 80 where the Mount Manager 12₁ will first change
the Disk Unit Status Table (FIG. 4) so that it reflects loss
and, therefore, unavailability of the disk storage unit 20 in
question. Then, in step 84, using the Disk Unit Status Table,
the Mount Manager will select another disk storage unit 20
from those identified by the Table as being available for
allocation to the requesting server processor 12. In step 86,
the Mount Points Table (FIG. 3) is modified by the Mount
Manager to reflect this new allocation. Finally, in step
88, the Mount Manager will return the identification of the
allocated disk storage unit 20 to the requesting server
processor 12, and return to step 70.

At the server processor end, the failover process is
conducted as broadly illustrated in FIG. 10. As shown, a server
processor 12 will get its first indication of a problem with its
allocated disk storage when, at step 90, it receives an error
message from the file system, indicating that an error has been
received in connection with an I/O read request. The error
message will further indicate that the allocated disk storage
unit 20 has failed. If such an error is received, the receiving
server processor 12 will send a failover message to the
Mount Manager 12₁ in step 91, and, in step 94, wait for the
response from the Mount Manager 12₁ that will contain the
name/address of the newly allocated disk storage unit 20
(sent in step 88 of the Mount Manager process, FIG. 9).
When that response is received with the identification of the
newly-allocated disk storage unit 20, replacing the one that
failed, the server processor will modify its own Mount Point
information (the Mount Point ID Table, FIG. 5) and send
the local file system a message with the identification of the
newly allocated disk storage system in steps 96 and 96,
respectively.

Returning to the Mount Manager process of FIG. 9, if the
request is determined, in step 72, to be an "Unmount"
request, the server processor 12 sending the request is, in
effect, asking that its allocated disk storage unit 20 be
de-allocated. The purpose of this series of steps (i.e., steps
102-104 that handle the Unmount request) is to free up the
disk storage unit so that it can be allocated to another server
processor if need be, thereby distributing I/O read loads
across all disk storage units of the particular mirroring
group. Thus, in step 102, the Mount Points Table (FIG. 3) is
modified to delete reference to the server processor and its
connection to the allocated disk storage unit 20. Finally, in
step 104, the Mount Manager sends a message in response
to the Unmount request to notify the requesting server
processor 12 that the unmount has been completed.

In connection with the unmount request sent to the Mount
Manager, the server processor sending the request performs
the steps illustrated in FIG. 11, beginning with step 110 in
which the server processor in question will unmount the file
system. Next, at step 112, a mountd process running on the
server processor 12 in question will send an "unmount"
request to the mount manager processor 12₁ (FIG. 1). In
response the mount manager processor 12₁ will modify the
mount point table (see step 102, FIG. 9, discussed above)
and return to the server processor a reply with a shut-down
instruction. The server processor 12 will, in step 114, wait
for the reply to the unmount request sent, and when received
the server processor will leave step 114 to shut down in step
116.

FIG. 12 illustrates the steps taken by an I/O request
handling process of the storage system 14 in response to
requests for disk operations such as I/O read and write
requests. The steps illustrated in FIG. 12 are performed by
the disk controller 22, and begin with step 120 when an I/O
request is received, moving the process to step 122 where a
determination is made of which of three requests has been
received: read, write, or "snapshot." The snapshot request is
discussed further below in connection with a second alternate
embodiment of the invention. An I/O read or write request will
identify, by disk address and block identification, where the
data is to be read from or written to. An I/O write request will
also contain or be accompanied by the data to be written. I/O
read requests identify the disk storage unit allocated the
requesting server processor, and are transferred to step 124
where, using the address of the requested data, the Data
Status Bitmap and Mirror Group Status Tables 38 and 56 are
consulted to determine first (from the Mirror Group Status
Table) whether the Mirroring Group containing the
requesting server processor is in the "Mirrored" or "Split" state. The
Split state of a Mirroring Group is discussed below in 
connection with explanation of the alternate embodiment of 
the invention. For now, we will assume that the requesting 
server processor 12 is a member of a Mirroring Group whose 
status is mirrored. 

Thus, after checking the Mirror Group Status Table 56
and determining the status of the Mirroring Group as being
mirrored, the Data Status Bitmap Table 38 is consulted to
determine whether the data requested is in an updated state,
or if it is stale. For example, referring for the moment to FIG.
6, assume that the address of the data to be read is identified
as being contained in mirroring group G01, Disk02, Disk
Block 2. As FIG. 6 indicates in row 52, there is an "N"
identifying that the requested data is not stale, and, therefore,
step 124 (FIG. 12) will be exited in favor of step 126 where
the data is read from the identified disk storage unit 20 and,
in step 128, transferred to the requesting server processor 12.
The request handling process then concludes with step 130.

On the other hand, assume the address of the requested
data is still mirroring group G01, Disk02, but now
Disk Block 3. As the Data Status Bitmap Table 38 of FIG.
6 indicates by the "Y" for that address, the data is stale.
Accordingly, this time step 124 will be exited for step 127
where the requested data is read from the master disk storage
unit of that mirroring group (i.e., G01), and, in step 128,
transferred to the requesting server processor 12, again
concluding with step 130.
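
The two read cases just described (steps 124 to 128 for a group in the
mirrored state) can be sketched as follows. This is only a minimal
illustration under stated assumptions, not the disclosed controller
logic: the dictionary layout, the names MASTER, stale, disks and
read_block, and the sample block contents are all hypothetical. A
stale flag of True stands for the "Y" entry of FIG. 6.

```python
# Hypothetical in-memory model; names and layout are illustrative assumptions.
MASTER = "Disk01"

# Stale flags for mirror disks, keyed by (disk, block): True stands for "Y".
stale = {("Disk02", 2): False, ("Disk02", 3): True}

# Block contents per disk; the master always holds the newest data.
disks = {
    "Disk01": {2: b"block-2", 3: b"block-3-new"},
    "Disk02": {2: b"block-2", 3: b"block-3-old"},
}


def read_block(assigned_disk, block):
    """Return (source_disk, data) for a read in the mirrored state."""
    # Step 124: consult the stale flag for the requested mirror block.
    if stale.get((assigned_disk, block), False):
        # Step 127: the mirror lags the master, so read the master instead.
        source = MASTER
    else:
        # Step 126: the mirror copy is current and can serve the read.
        source = assigned_disk
    # Step 128: hand the data back to the requesting server processor.
    return source, disks[source][block]
```

With this sample data, a read of Disk Block 2 is served by Disk02
itself, while a read of Disk Block 3 is redirected to the master.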

Assume now that the request received in step 120 is an I/O 
write request. This time step 122 will transfer the request to 
step 140 where a Data Write Sequence (described below) is 
called, followed by the concluding step 130. 
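
The three-way determination of step 122 amounts to a small dispatcher.
The sketch below assumes a request is a dictionary with a "kind" key
and that the three handlers are supplied by the caller; both are
hypothetical conventions chosen for illustration, and nothing here is
taken from the disclosed controller.

```python
def handle_request(request, read_handler, write_handler, snapshot_handler):
    """Route a request by kind, following the three-way test of step 122."""
    kind = request["kind"]
    if kind == "read":
        return read_handler(request)      # steps 124 to 128
    if kind == "write":
        return write_handler(request)     # step 140: Data Write Sequence
    if kind == "snapshot":
        return snapshot_handler(request)  # step 134: call to the sync daemon
    raise ValueError(f"unknown request kind: {kind!r}")
```

Passing the handlers as arguments keeps the routing separate from the
read, write, and snapshot logic, each of which is described on its own.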

The major steps taken for the Data Write Sequence are
broadly illustrated in FIG. 13. The Sequence begins with
step 142, when the call (e.g., as may be made by step 140 of
the disk controller process; FIG. 12), together with the I/O
write request, is received. The request is transferred to step
144 where, using the identification of the mirroring group
containing the disk storage unit to be written, the Mirror
Group Status Table (FIG. 7) is consulted to determine the
state of the mirroring group, i.e., whether it is in a mirrored
or a Split state. If in a mirrored state, step 144 leads to step
150; if not, step 144 will transfer the request to step 146.

Assume the disk storage unit to be written is in mirroring
group G01 which, as the Mirror Group Status Table of FIG.
7 indicates, is in the mirrored state. Accordingly, the
determination made in step 144 will lead to step 150 where the
data of the request is written to the master disk storage unit
of the identified mirroring group, here, disk storage unit 20₁.
Then, in step 152, the Data Status Bitmap Table (FIG. 6) is
updated to reflect the newly-written data by setting the bit
for the written disk block of the master disk (identified as
Disk01 in FIG. 6) to a state that specifies the update with a
"Y." Next, in step 154, the corresponding disk blocks
containing mirrored data for the other mirror disk storage
units (e.g., here disk storage units 20₂ and 20₃) are set to a
state ("N") to reflect that the particular disk block does not
match the corresponding disk block of the master disk
storage unit of that mirroring group.

To illustrate, assume that Disk Block 1 of Disk01,
mirroring group G01 was written in step 150. The "Updated" bit
for Disk Block 1 (Disk01, mirroring group G01) is set to a
"Y" state to indicate that update. Then, in step 154, the
"Stale" bits for the corresponding Disk Blocks of the
mirroring disks (Disk02 and Disk03) are set to "N" to indicate
that they now contain stale data needing updating.
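
The mirrored-state write just illustrated (steps 150, 152 and 154) can
be sketched as below. This is a minimal model under assumptions: the
MirrorGroup class, its field names, and write_mirrored are
hypothetical, and the table entries are kept as Python sets rather
than as "Y"/"N" letters, so the sketch records only which blocks are
pending and which mirror blocks lag the master.

```python
from dataclasses import dataclass, field


@dataclass
class MirrorGroup:
    """Hypothetical stand-in for one mirroring group and its status bits."""
    master: str
    mirrors: list
    state: str = "mirrored"                    # or "split"
    data: dict = field(default_factory=dict)   # (disk, block) -> bytes
    updated: set = field(default_factory=set)  # master blocks not yet mirrored
    stale: set = field(default_factory=set)    # (mirror, block) pairs lagging


def write_mirrored(group, block, payload):
    """Steps 150 to 154 for a group in the mirrored state."""
    if group.state != "mirrored":
        raise ValueError("this sketch covers only the mirrored state")
    group.data[(group.master, block)] = payload  # step 150: write the master
    group.updated.add(block)                     # step 152: flag master block
    for mirror in group.mirrors:                 # step 154: mirrors now lag
        group.stale.add((mirror, block))
```

After a write to Disk Block 1, the master holds the new data, block 1
is recorded as updated, and block 1 of every mirror is recorded as
needing a copy.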

Running in the background on the disk controller 22 is the
Sync Daemon 26 (FIG. 1), which periodically checks the
Data Status Bitmap Table to see if the mirrored data matches
that of the master disk storage unit of each mirroring group.
Thus, ultimately, after the above described write, the Sync
Daemon 26 will check the Data Status Bitmap Table to find that
the "Updated" bit for Disk Block 1 of Disk01 (mirroring
group G01) indicates that the data was updated, and that the
corresponding mirrored Disk Blocks, being set to "N," need
updating. Accordingly, the Sync Daemon will write the data
(which preferably has been cached) to the Disk Blocks 1 of
the mirrored disk storage units, and reset the bits to a "Y" to
indicate they no longer need updating, and that the data there
matches the corresponding data on the master disk storage
unit of that mirroring group.
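
One background pass of such a daemon might look like the following
sketch. It is an assumption-laden illustration rather than the
disclosed Sync Daemon 26: the function name sync_pass, its arguments,
and the use of sets for the pending and lagging blocks are all
hypothetical, and caching of the written data is not modeled.

```python
def sync_pass(master, mirrors, data, updated, stale):
    """Copy updated master blocks to lagging mirrors in one pass.

    data maps (disk, block) to bytes, updated is a set of master block
    numbers awaiting mirroring, and stale is a set of (mirror, block)
    pairs that lag the master. All three are modified in place.
    Returns the number of block copies performed.
    """
    copies = 0
    for block in sorted(updated):
        for mirror in mirrors:
            if (mirror, block) in stale:
                # Bring the mirror block up to date from the master.
                data[(mirror, block)] = data[(master, block)]
                stale.discard((mirror, block))
                copies += 1
    # Every mirror was visited, so no master block is pending any more.
    updated.clear()
    return copies
```

Calling the function again immediately afterwards performs no copies,
which matches the idea of a daemon that wakes periodically and finds
nothing to do until the next write.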

The Split state of a Mirroring Group has to do with the
alternate embodiment of the present invention, which limits
access to the master disk storage unit 20₁ even in instances
when the master disk storage unit 20₁ carries data more up
to date than that of the mirrored disk storage. To understand
the Split state, assume that the Mirroring Group G01 is in a
Split, rather than mirrored, state. This is illustrated by
the Mirroring Group Status Table 200 shown in FIG. 15.
(FIG. 15, and the remaining FIGS. 16A-16C discussed
below, refer only to a single Mirroring Group, G01, and show
that Mirroring Group as containing only two disk storage
units 20: the master disk storage unit 20₁ and a mirror disk
storage unit 20₂, with respective addresses identified as
"Disk 01" and "Disk 02." The purpose of this is to refrain
from unduly complicating the discussion of this second
embodiment of the invention.)

FIG. 16A illustrates a Data Status Bitmap Table 210a for the
represented system in some initial state, showing the
mirroring group G01 as including two disk storage units: the
master disk storage unit 20₁ and the mirrored disk storage
unit 20₂. Also, the Data Status Bitmap Table 210a indicates
that the data carried by the mirrored disk storage unit is
assumed to be "fixed," i.e., the data is valid and can be used
for responses to I/O read requests for that data. The Data
Status Bitmap Table 210a further indicates that Disk
Blocks 1 and 2 of the master disk storage unit (Disk 20₁) have
not been updated since being mirrored at Disk Blocks 1 and
2 of the mirroring disk storage unit (Disk 20₂). How the
storage system 14 "fixes" mirrored data will be discussed
below in connection with the storage system's response to a
Snapshot request from a server processor 12.

Now, assume that one of the server processors 12 sends an
I/O write request to the storage system 14 for data to be
written to Disk Block 1 of the master disk storage 20₁.
Referring for the moment to FIG. 12, steps 120 and 122 will
find that the received request is one for writing data, and pass
the request to step 140, which calls the data write sequence
shown in FIG. 13. Then, as FIG. 13 shows, the call is
received by step 142, passed to step 144 where the controller
22 examines the Mirror Group Status Table 200 (FIG. 15)
and sees that the Mirroring Group containing the disk
storage unit to which the request is directed is in a Split state.
Accordingly, the request is passed to step 146 where the
Data Status Bitmap Table 210a (FIG. 16A) is checked.
Seeing that the data then held at Disk Block 1 is mirrored
(i.e., by the "N" in the updated box for Disk 20₁ to indicate
that the data has not been updated recently, and the "N" in
the corresponding Disk Block for the Disk 20₂ to indicate
that the corresponding data is not stale), step 146 is left in
favor of step 170 where the data is written to Disk Block 1
of the master disk storage unit 20₁. Then, in step 172, the
"updated" bit in the Mirror Group Status Table 200 is
changed to a "Y" to indicate that data has been written, but
not yet mirrored.

As a result of this write operation, the state of the new
Data Status Bitmap Table, after step 172, is changed to
that shown in FIG. 16B. As can be seen, the field for Disk
20₁, Disk Block 1, is set to a "Y," indicating that the data in
that block has changed or been modified. That, together with
the "N" in the Disk 20₂, Disk Block 1 field, indicates that
even if the data carried by the master disk storage has been
updated, the corresponding space on the mirrored disk
storage is different, but still valid.

Next, assume that the disk controller 14 receives an I/O
read request from one of the servers 12, requesting data
stored on mirrored disk, Disk 20₂, Data Block 1. Returning
to FIG. 12, steps 120 and 122 will pass the request to step
124. There, the process will determine that the requested
data is still indicated as being not stale, i.e., it is valid, by the
"N" in the Staled field of FIG. 16B for Disk 02, Disk Block
1. Thus, the requested data will be read and passed to the
requesting server processor 12. In fact, this is a "fixed" state,
as will become apparent below.

Assume now that the I/O read request is followed by a
Snapshot request being issued by one of the server
processors 12 to the storage system 14. The disk controller 22,
again in steps 120, 122 (FIG. 12), will check the request, find
that it is a Snapshot request, and pass it to step 134 to
execute a call, with the request, to the sync daemon. The
sync daemon will, as illustrated in FIG. 14, receive the
request in step 180, see that the request is through a call from
the disk controller 22, and pass the request to step 190,
where it is determined that it is a Snapshot request.
Accordingly, the sync daemon operation will proceed to step
192 to, using the Data Status Bitmap Table 210b, perform a
logical OR of the updated fields of the mirroring disk storage
units for each Disk Block, with that of the master disk. Thus,
there will be no change in the Updated and Staled fields for
Disk Block 2 of the master and mirror disk storage units 20₁
and 20₂. However, since those fields are different for Disk
Block 1 (Updated=Y for Disk Block 1 of Disk 20₁, and N for
Disk Block 1 of Disk 20₂), the fields will, in steps 192 and
194, change to the values shown in the Data Status Bitmap
Table 210c shown in FIG. 16C. All Updated fields of Disk
20₁ are set to N in step 194.

Some time later, the sync daemon will proceed on its own
accord through steps 160, 162, 164, and 166 to locate those
mirrored disk storage units that need updating, as described
above. Finding the Y in the Stale field of Disk Block 1,
address Disk02, will effect copying of the updated data from
the master disk storage (Disk Block 1, address Disk 01) to
the mirror storage. The Y will then be reset to an N.

However, before the Disk Block 1 of the mirrored disk
storage unit 20₂ is updated, suppose an I/O read request is
received, requesting mirrored data from Disk Block 1,
address Disk 02? When the I/O read request is received, as
FIG. 12 shows, the disk controller will see that the request
is a read request and, from step 122, pass the request to step
124. In step 124, the disk controller will consult the Data
Status Bitmap Table 210c (FIG. 16C) and see, by the Y,
that the requested data is stale. Therefore, as was done above
in connection with the Mirrored state of the Mirroring Group
G01, the request will be passed to step 127 to read the
requested data from the master disk storage unit 20₁, i.e., the
updated data stored at Disk Block 1, address Disk 01.

Consider now the situation involving an update of the
master storage unit 20₁ before the mirrored disk storage can
be updated with the prior new or modified data. That is,
assume data at Disk Block 1 of the master disk storage unit
20₁ is re-written or otherwise modified, but before a
Snapshot request is received, another I/O write request is received
to again update that same data. This is the situation existing
with the Data Status Bitmap Table 210b (FIG. 16B) or 210c
(FIG. 16C). Given either of these situations, when an I/O
write request is received to write data to the Disk Block 1 of
the master disk unit 20₁, the request will first be handled by
steps 120, 122, and 140 of the Disk Controller Process (FIG.
12), as described above, to make a call to the Disk Write
Sequence shown in FIG. 13.

The Disk Write Sequence will, in steps 142 and
144, and with reference to the Mirror Group Status Table
200, see that the Mirroring Group to which the request
is directed is in the Split state. And, in step 146, a check of
the Data Status Bitmap 210b (FIG. 16B) or 210c (FIG. 16C)
will show that the mirrored data has not yet been updated.
Accordingly, before the data of the most recent request is
written, the Disk Write Sequence will proceed to step 160
where the data that will be over-written by the recent request
is read from the master disk storage and, in step 162, copied
to each mirrored disk storage unit (here, Disk Block 1 of disk
storage unit 20₂) requiring updating. Then, in step 164, the
Data Status Bitmap 210b or 210c, as the case may be, is
updated to reflect that the mirrored data is updated.

The data of the received request is then written to the
master disk storage unit 20₁ (step 166), the corresponding
field of the Data Status Bitmap for the master disk storage is
set to indicate once again that the master disk storage has an
update that is not reflected in the mirrored storage, and the
Sequence ends with step 169.

In conclusion there has been disclosed a storage system
that operates to distribute I/O read requests across several
disk storage units maintaining mirrored versions of earlier
written data, thereby allowing concurrent access by multiple
processors or servers. While a full and complete disclosure
of the embodiments of the invention has been made, it will be
obvious to those skilled in this art that various modifications
may be made. For example, if there are more processors than
mirrored disk storage units, several processors can be
assigned to the same disk storage unit, while the other
processors enjoy exclusive use of other disk storage units.
Also, the storage unit 14 can be configured to present to the
processors logical disk drive units, each logical disk storage
unit mapping to physical disk storage units. That means a
logical disk storage unit can be constructed by several
physical disk storage units. For example, suppose that the
storage system comprises two physical disk units x and
y. A logical volume may be configured by mapping the
address space of that logical volume to the concatenated
address space of the two physical disk units x and y. Another
example, is to have a logical volume that is mapped to a
concatenation of some portion of the disk x and some
portion of the disk y.

What is claimed is:

1. A processing system having a plurality of disk units
communicatively connected to two or more server processors
by a storage system, a method of distributing read access
to data stored on the plurality of disk units that includes the
steps of:

identifying one of the plurality of disk units as a master
disk unit;

assigning each of the other of the plurality of disk units to
a corresponding one of the two or more server processors;

writing data received from the two or more server
processors to the master disk unit;

copying the data to the other of the plurality of disk units;

receiving at the storage unit a request to read data from
one of the other processor units to read data from the
one of the other of the plurality of disk units assigned
to the one processor unit and send the data to the one
processor unit.

2. The method of claim 1, including the steps of writing
data to a first location of the master disk unit; and

before the first data is copied to a one of the other of the
plurality of disk units, receiving a request to read data
from a location of the one disk unit corresponding to
the first location; and

reading the data from the first location of the master disk
unit and sending the data to the server processor.

3. The method of claim 1, including the step of maintaining
at each of the server processors a mount point table
identifying the assigned disk unit for such server processor.

4. The method of claim 1, including the step of designating
a one of the two or more server processors as a mount
manager responsible for creating and maintaining a mount
points table that identifies which of the disk units is assigned
to which of the two or more server processors.

5. The method of claim 1, including the step of detecting
a failure of the assigned disk unit by a one of the two or more
server processors to send a message to the mount manager
for assignment of a replacement disk unit.

6. The method of claim 1 including the steps of:

providing the master disk unit with a number of disk
portions;

providing each of the other of the plurality of disk units
with corresponding disk portions;

maintaining at the storage system a Data Status Bitmap
Table to identify whether data written to a one of the
disk portions of the master disk data has been copied to
the other of the plurality of disk units.

7. The method of claim 6, wherein the writing step
includes modifying the Data Status Bitmap Table to indicate
that data written to the master disk unit has not been copied
to the other of the plurality of disk units.

8. The method of claim 7, wherein the copying step
includes changing the Data Status Bitmap Table for each of
the other of the plurality of disk units to which the data is
copied that the data has been copied thereto.

9. A data processing system, including:

a number of processors;

a storage system having a plurality of storage units,
including a master storage unit, the storage system
being communicatively coupled to the number of
processors;

the number of processors including a mount manager
[illegible] processors a corresponding one of the
plurality of storage units;

the storage system including a disk controller operable to
[illegible] number of processors to the master
disk unit and then copy the data to each of the
corresponding ones of the plurality of storage units, each of
the number of processors reading data from the
[illegible]

10. The data processing system of claim 9, including a bus
structure for communicatively connecting the mount
manager to the other of the number of processors.

11. The data processing system of claim 9, wherein the
master storage unit includes a first storage space for storing
a predetermined amount of data, and each of the other of the
plurality of storage units having a second storage space for
storing at least the predetermined amount of data.

12. The data processing system of claim 9, wherein each
of the plurality of storage units are physical disk elements.

13. The data processing system of claim 9, including a
data structure accessible to the disk controller for identifying
[illegible]

14. The data processing system of claim 13, wherein the
disk controller operates to consult the data structure when an
[illegible] processors to
[illegible] corresponding one of the storage units if
[illegible] and copied to
the one storage unit, otherwise to read the data from the
master storage unit.

15. [illegible] respectively,
[illegible] processor elements, including:

a master storage unit and a number of mirrored storage
units;

a controller that receives the I/O write requests to write
[illegible] each of the
mirrored storage units;

[illegible] assignment of at least each of the mirrored
[illegible] corresponding ones of the plurality of
processor elements, the controller receiving an I/O read
request from a one of the processor elements to read
data from the corresponding one of the storage units
assigned to such processor element.

16. The data storage of claim 15, including a data
structure accessible to the disk controller for identifying when
data is written to the master storage unit and copied to the
other of the storage units, the disk controller receiving the
I/O read request to read data from the assigned mirrored
storage unit if data written to the master storage unit has
been copied to the assigned mirrored storage unit, else to
read data from the master storage unit.

17. The data storage of claim 16 wherein the master and
mirrored storage units are disk storage units.

* * * * *